Significant Pattern Mining on Continuous Variables

Authors

  • Mahito Sugiyama
  • Karsten M. Borgwardt
Abstract

We present an efficient feature selection method that can find all multiplicative combinations of continuous features that are statistically significantly associated with the class variable, while rigorously correcting for multiple testing. The key to overcoming the combinatorial explosion in the number of candidates is to derive a lower bound on the p-value for each feature combination, which enables us to massively prune combinations that can never be significant and to gain more statistical power. While this problem has been addressed for binary features in the past, we present here the first solution for continuous features. In our experiments, our novel approach detects true feature combinations with higher precision and recall than competing methods that require a prior binarization of the data.
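
To make the combinatorial explosion mentioned in the abstract concrete, the following is a minimal sketch of the naive baseline that a pruning method would improve on: it enumerates every multiplicative combination of features for one sample and computes the Bonferroni-corrected per-test threshold when all 2^d - 1 combinations are tested. It is not the authors' algorithm and does not implement their p-value lower bound; the function names `multiplicative_combinations` and `bonferroni_threshold` and the sample values are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import prod


def multiplicative_combinations(row, max_size=None):
    """Yield (index_tuple, product) for every non-empty feature subset of one sample.

    Hypothetical helper for illustration: brute-force enumeration, no pruning.
    """
    d = len(row)
    max_size = d if max_size is None else max_size
    for k in range(1, max_size + 1):
        for idx in combinations(range(d), k):
            yield idx, prod(row[i] for i in idx)


def bonferroni_threshold(alpha, n_features):
    """Per-test significance level if all 2^d - 1 combinations are tested.

    Plain Bonferroni correction (an assumption for this sketch).
    """
    n_tests = 2 ** n_features - 1
    return alpha / n_tests


# Made-up sample with three continuous features.
sample = [0.5, 2.0, 4.0]
combos = dict(multiplicative_combinations(sample))

# With d = 3 there are only 7 combinations; with d = 20 there are over a million.
threshold_small = bonferroni_threshold(0.05, 3)
threshold_large = bonferroni_threshold(0.05, 20)
```

Under plain Bonferroni correction the per-test threshold shrinks exponentially with the number of features, which is why discarding combinations that can never reach significance both saves computation and recovers statistical power.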

Similar resources

Finding Regional Co-location Patterns for Sets of Continuous Variables

This paper proposes a novel framework for mining regional co-location patterns with respect to sets of continuous variables in spatial datasets. The goal is to identify regions in which multiple continuous variables with values from the wings of their statistical distribution are co-located. A co-location mining framework is introduced that operates in the continuous domain without the need for ...

Quantization of Continuous Data for Pattern Based Rule Extraction

A great deal of interesting real-world data is encountered through the analysis of continuous variables; however, many of the robust tools for rule discovery and data characterization depend upon the underlying data existing in an ordinal, enumerable or discrete data domain. Tools that fall into this category include much of the current work in fuzzy logic and rough sets, as well as all forms of...

Discovering Regional Co-location Patterns for Sets of Continuous Variables in Spatial Datasets

This paper proposes a novel framework for mining regional co-location patterns with respect to sets of continuous variables in spatial datasets. The goal is to identify regions in which multiple continuous variables with values from the wings of their statistical distribution are co-located. A co-location mining framework is introduced that operates in the continuous domain without and which vi...

A global optimal algorithm for class-dependent discretization of continuous data

This paper presents a new method to convert continuous variables into discrete variables for inductive machine learning. The method can be applied to pattern classification problems in machine learning and data mining. The discretization process is formulated as an optimization problem. We first use the normalized mutual information that measures the interdependence between the class labels and...

The WM method completed: a flexible fuzzy system approach to data mining

In this paper, the so-called Wang–Mendel (WM) method for generating fuzzy rules from data is enhanced to make it a comprehensive and flexible fuzzy system approach to data description and prediction. In the description part, the core ideas of the WM method are used to develop three methods to extract fuzzy IF–THEN rules from data. The first method shows how to extract rules for the user-specife...

Journal:
  • CoRR

Volume: abs/1702.08694

Publication date: 2017